---
title: "Clustering in R"

execute:
  warning: false
  error: false
  eval: false
  
format:
  html:
    toc: true
    toc-location: right
    code-fold: show
    code-tools: true
    number-sections: true
    code-block-bg: true
    code-block-border-left: "#31BAE9"
---

## Introduction

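This document demonstrates how to perform clustering in R. Clustering is an unsupervised learning technique that groups similar observations together without using any labels. Although `tidymodels` is loaded, the models themselves are fit with base R's `kmeans()` and `hclust()` functions; the `tidyverse` handles data preparation and `factoextra` produces the plots. We will use the built-in `iris` dataset (150 flowers, four numeric measurements) for this demonstration.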

## Load Data

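First, we load the necessary libraries and the `iris` dataset. We then drop the `Species` column, because clustering is unsupervised and should see only the four numeric measurements.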

```{r}
#| label: load-data
#| echo: true

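library(tidyverse)   # data manipulation (dplyr, %>%)
library(tidymodels)  # modeling ecosystem (loaded, but the fits below use base R)
library(factoextra)  # cluster and dendrogram visualization

data(iris)

# Keep only the four numeric measurements; the species label is not used for clustering
iris_data <- iris %>% select(-Species)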
```
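
Both k-means and Euclidean-distance hierarchical clustering are distance-based, so it is worth checking whether the variables are on comparable scales before fitting anything. The chunk below is an optional sketch of that check using only base R, not a step from the original workflow; `iris_scaled` is a hypothetical name introduced here for illustration. All four measurements are in centimetres, so the rest of this document continues with the unscaled `iris_data`.

```{r}
#| label: check-scale
#| echo: true

# Rebuild the numeric-only data with base R so this chunk runs on its own
iris_data <- iris[, names(iris) != "Species"]

# Per-variable mean and standard deviation
sapply(iris_data, mean)
sapply(iris_data, sd)

# Optional: standardize each column to mean 0 and standard deviation 1
# (`iris_scaled` is a hypothetical name, not used elsewhere in this document)
iris_scaled <- as.data.frame(scale(iris_data))
round(colMeans(iris_scaled), 10)
sapply(iris_scaled, sd)
```

If the variables had very different units or spreads, passing the standardized data to the clustering functions instead of the raw data would be a reasonable alternative.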

## K-Means Clustering

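K-means partitions the observations into a fixed number of clusters by assigning each point to its nearest cluster center and minimizing the total within-cluster sum of squares. We group the iris data into 3 clusters, matching the three species in the dataset. Because the result depends on the random initial centers, we set a seed and use `nstart = 25`, so that `kmeans()` tries 25 random starts and keeps the best solution.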

```{r}
#| label: kmeans
#| echo: true

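set.seed(123)  # make the random starting centers reproducible

# Fit k-means with 3 clusters; keep the best of 25 random starts
kmeans_model <- kmeans(iris_data, centers = 3, nstart = 25)

# Visualize the clusters (with more than two variables, points are plotted
# on the first two principal components)
fviz_cluster(kmeans_model, data = iris_data)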
```
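
The object returned by `kmeans()` is a list, so the fit can also be inspected directly rather than only plotted. As a rough sketch, the chunk below refits the same model with base R and cross-tabulates the cluster assignments against the `Species` labels that were set aside earlier. The species are used here purely as an after-the-fact check, and cluster numbers are arbitrary, so only the pattern of the table matters.

```{r}
#| label: kmeans-inspect
#| echo: true

# Refit the same model so this chunk runs on its own
iris_data <- iris[, names(iris) != "Species"]
set.seed(123)
kmeans_model <- kmeans(iris_data, centers = 3, nstart = 25)

kmeans_model$size          # number of observations in each cluster
kmeans_model$centers       # cluster means for each measurement
kmeans_model$tot.withinss  # total within-cluster sum of squares

# Ratio of between-cluster to total sum of squares
# (closer to 1 means the clusters account for more of the variation)
kmeans_model$betweenss / kmeans_model$totss

# Compare clusters with the held-out species labels
table(Species = iris$Species, Cluster = kmeans_model$cluster)
```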

## Hierarchical Clustering

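Hierarchical clustering builds a tree of nested clusters instead of fixing the number of clusters in advance. Here we use the agglomerative approach in `hclust()`: starting from single observations, the two closest clusters are merged repeatedly until one cluster remains. Distances between observations are Euclidean, and Ward's minimum-variance criterion (`ward.D2`) decides which clusters to merge. The resulting dendrogram is then cut into 3 groups, the same number used for k-means.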

```{r}
#| label: hclust
#| echo: true

# Calculate the distance matrix
dist_matrix <- dist(iris_data, method = "euclidean")

# Perform hierarchical clustering
hclust_model <- hclust(dist_matrix, method = "ward.D2")

# Visualize the dendrogram
fviz_dend(
  hclust_model,
  k = 3,                     # cut the tree into 3 groups
  cex = 0.5,                 # label size
  k_colors = c("#2E9FDF", "#00AFBB", "#E7B800"),
  color_labels_by_k = TRUE,  # color labels by group
  rect = TRUE                # draw a rectangle around each group
)
```
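
A dendrogram shows the merge structure, but downstream analysis usually needs explicit cluster labels. One way to get them, sketched here with base R's `cutree()`, is to cut the tree at the same `k = 3` used in the plot. `hclust_clusters` is a name introduced for this example, and the comparison with `Species` is again only a check, not part of the clustering.

```{r}
#| label: hclust-cut
#| echo: true

# Rebuild the tree so this chunk runs on its own
iris_data <- iris[, names(iris) != "Species"]
dist_matrix <- dist(iris_data, method = "euclidean")
hclust_model <- hclust(dist_matrix, method = "ward.D2")

# Assign each observation to one of 3 groups
# (`hclust_clusters` is a hypothetical name used only in this example)
hclust_clusters <- cutree(hclust_model, k = 3)

table(hclust_clusters)  # cluster sizes

# Compare with the held-out species labels
table(Species = iris$Species, Cluster = hclust_clusters)
```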

## Conclusion

This document provided a brief overview of clustering in R. We fit k-means with `kmeans()` and hierarchical clustering with `hclust()` (Ward linkage on Euclidean distances) on the `iris` measurements, and visualized both results with `factoextra`.